Couples often manage chronic diseases together, and the management takes an emotional toll on both patients and their romantic partners. Consequently, recognizing the emotions of each partner in daily life could provide insight into their emotional well-being in chronic disease management. Currently, the process of assessing each partner's emotions is manual, time-intensive, and costly. Although there is existing work on emotion recognition among couples, none of it has used data collected from couples' interactions in daily life. In this work, we collected 85 hours (1,021 five-minute samples) of real-world multimodal smartwatch sensor data (speech, heart rate, accelerometer, and gyroscope) and self-reported emotion data (n = 612) from the partners of 13 couples managing type 2 diabetes in daily life. We extracted physiological, movement, acoustic, and linguistic features, and trained machine learning models (support vector machines and random forests) to recognize each partner's self-reported emotions (valence and arousal). Our best models performed better than chance, at 63.8% for arousal and 78.1% for valence. This work contributes toward building automated emotion recognition systems that would eventually enable partners to monitor their emotions in daily life and allow the delivery of interventions to improve their emotional well-being.
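The classification step described above (SVM and random forest on extracted features) can be sketched as follows. This is a minimal illustration with synthetic data: the 40-dimensional feature vectors and the binary high/low label stand in for the paper's physiological, movement, acoustic, and linguistic features and its valence/arousal labels.

```python
# Sketch: train an SVM and a random forest to predict a binary emotion
# label from feature vectors. Data here is synthetic, not the study's.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(612, 40))                               # 612 samples, 40 features
y = (X[:, 0] + 0.5 * rng.normal(size=612) > 0).astype(int)   # synthetic binary label

for model in (SVC(kernel="rbf"),
              RandomForestClassifier(n_estimators=200, random_state=0)):
    scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    print(f"{type(model).__name__}: {scores.mean():.3f}")
```

Balanced accuracy is a sensible scorer here since self-reported emotion labels are typically imbalanced; the paper's exact metric is not specified in this summary.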
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
Although Reinforcement Learning (RL) has shown impressive results in games and simulation, real-world application of RL suffers from its instability under changing environment conditions and hyperparameters. We give a first impression of the extent of this instability by showing that the hyperparameters found by automatic hyperparameter optimization (HPO) methods are not only dependent on the problem at hand, but even on how well the state describes the environment dynamics. Specifically, we show that agents in contextual RL require different hyperparameters if they are shown how environmental factors change. In addition, finding adequate hyperparameter configurations is not equally easy for both settings, further highlighting the need for research into how hyperparameters influence learning and generalization in RL.
Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks respectively. However, it is still an open challenging research question to extract prosodic information because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for unsupervised training schemes to achieve robust large-scale and speaker-independent ASR. The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. Specifically, we identify, design, implement and integrate three crucial components in our proposed speech reconstruction model Prosody2Vec: (1) a unit encoder that transforms speech signals into discrete units for semantic content, (2) a pretrained speaker verification model to generate speaker identity embeddings, and (3) a trainable prosody encoder to learn prosody representations. We first pretrain the Prosody2Vec representations on unlabelled emotional speech corpora, then fine-tune the model on specific datasets to perform Speech Emotion Recognition (SER) and Emotional Voice Conversion (EVC) tasks. Both objective and subjective evaluations on the EVC task suggest that Prosody2Vec effectively captures general prosodic features that can be smoothly transferred to other emotional speech. In addition, our SER experiments on the IEMOCAP dataset reveal that the prosody features learned by Prosody2Vec are complementary and beneficial for the performance of widely used speech pretraining models and surpass the state-of-the-art methods when combining Prosody2Vec with HuBERT representations. Some audio samples can be found on our demo website.
Manual prescription of the field of view (FOV) by MRI technologists is variable and prolongs the scanning process. Often, the FOV is too large or crops critical anatomy. We propose a deep-learning framework, trained by radiologists' supervision, for automating FOV prescription. An intra-stack shared feature extraction network and an attention network are used to process a stack of 2D image inputs to generate output scalars defining the location of a rectangular region of interest (ROI). The attention mechanism is used to make the model focus on the small number of informative slices in a stack. Then the smallest FOV that makes the neural network predicted ROI free of aliasing is calculated by an algebraic operation derived from MR sampling theory. We retrospectively collected 595 cases between February 2018 and February 2022. The framework's performance is examined quantitatively with intersection over union (IoU) and pixel error on position, and qualitatively with a reader study. We use the t-test for comparing quantitative results from all models and a radiologist. The proposed model achieves an average IoU of 0.867 and average ROI position error of 9.06 out of 512 pixels on 80 test cases, significantly better (P<0.05) than two baseline models and not significantly different from a radiologist (P>0.12). Finally, the FOV given by the proposed framework achieves an acceptance rate of 92% from an experienced radiologist.
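The intersection-over-union metric used above to compare the predicted ROI against the reference can be computed as below; the rectangle encoding (x0, y0, x1, y1) is an assumption for illustration, not the paper's internal representation.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned rectangles,
    each given as (x0, y0, x1, y1) with x0 < x1 and y0 < y1."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 1/7
```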
Variational quantum algorithms (VQAs) offer the most promising path to obtaining a quantum advantage with noisy intermediate-scale quantum (NISQ) processors. Such systems leverage classical optimization to tune the parameters of a parameterized quantum circuit (PQC). The goal is to minimize a cost function that depends on the measurement outputs obtained from the PQC. The optimization is typically implemented via stochastic gradient descent (SGD). On NISQ computers, gate noise caused by imperfections and decoherence affects the stochastic gradient estimates by introducing a bias. Quantum error mitigation (QEM) techniques can reduce the estimation bias without requiring an increase in the number of qubits, but in turn they cause an increase in the variance of the gradient estimates. This work studies the impact of quantum gate noise on the convergence of SGD for the variational eigensolver (VQE), a fundamental instance of VQAs. The main goal is to identify the conditions under which QEM can enhance the performance of SGD for VQE. It is shown that quantum gate noise induces a non-zero error floor on the convergence error of SGD (evaluated with respect to a reference noiseless PQC) that depends on the number of noisy gates, the strength of the noise, and the characteristics of the observable being measured and minimized. In contrast, with QEM, any arbitrarily small error can be obtained. Furthermore, for error levels achievable with or without QEM, QEM can reduce the number of required iterations, but only as long as the quantum noise level is sufficiently small and a sufficiently large number of measurements is allowed at each SGD iteration. Numerical examples for a max-cut problem corroborate the main theoretical findings.
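The bias/variance trade-off described above can be illustrated on a purely classical toy problem (this is not a quantum simulation): SGD minimizing f(θ) = θ² with a gradient estimator that is either biased but low-variance (mimicking unmitigated gate noise) or unbiased but higher-variance (mimicking QEM). The biased estimator converges to a non-zero error floor at θ = -bias/2; the unbiased one fluctuates around the true minimum.

```python
import numpy as np

def run_sgd(bias, sigma, steps=2000, lr=0.05, seed=0):
    """SGD on f(theta) = theta**2 (true gradient 2*theta), with the
    gradient estimate corrupted by a fixed bias and Gaussian noise."""
    rng = np.random.default_rng(seed)
    theta = 1.0
    for _ in range(steps):
        grad = 2 * theta + bias + sigma * rng.normal()
        theta -= lr * grad
    return theta

# Biased, low-variance estimator: settles near theta = -bias/2, not 0.
print(run_sgd(bias=0.5, sigma=0.05))
# Unbiased, higher-variance estimator: fluctuates around the true minimum 0.
print(run_sgd(bias=0.0, sigma=0.5))
```

In the paper's setting, reducing the variance of the unbiased estimator corresponds to taking more measurements per SGD iteration, which is why QEM helps only when a sufficiently large measurement budget is available.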
In recent years, the use of deep learning (DL) algorithms has improved the performance of vision-based space applications. However, generating large amounts of annotated data to train these DL algorithms has proven challenging. While synthetically generated images can be used, DL models trained on synthetic data are often prone to performance degradation when tested in real environments. In this context, the Interdisciplinary Centre for Security, Reliability and Trust (SnT) at the University of Luxembourg developed the "SnT Zero-G Lab" for training and validating vision-based space algorithms under conditions emulating real-world space environments. An important aspect of the SnT Zero-G Lab development was equipment selection. Drawing on lessons learned during the lab's development, this paper proposes a systematic approach that combines market surveys with experimental analyses for equipment selection. In particular, the paper focuses on image acquisition equipment for a space lab: background materials, cameras, and lighting. The results of the experimental analyses show that effective equipment selection in a space lab development project requires market surveys complemented by experimental analysis.
Deep learning models are being applied in an increasing number of success stories, but how do they perform in the real world? To test a model, a specific, cleaned dataset is assembled. However, when deployed in the real world, the model will face unexpected, out-of-distribution (OOD) data. In this work, we show that the so-called "radiologist-level" CheXNet model fails to recognize all OOD images and classifies them as exhibiting lung disease. To address this issue, we propose in-distribution voting, a novel method for classifying out-of-distribution images in multi-label classification. Using independent per-class in-distribution (ID) predictors trained on ID and OOD data, we achieve, on average, 99% ID classification specificity and 98% sensitivity, considerably improving end-to-end performance on the Chest X-Ray 14 dataset compared to previous works. Our method surpasses other output-based OOD detectors even when trained with only ImageNet as OOD data and tested with X-ray OOD images.
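A minimal sketch of the voting rule as described: each class has an independent binary in-distribution predictor, and their per-input ID probabilities are aggregated into a single accept/reject decision. Aggregation by plain averaging and the 0.5 threshold are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def in_distribution_vote(id_probs, threshold=0.5):
    """id_probs: per-class probabilities that the input is in-distribution,
    one from each independent class-wise ID-vs-OOD predictor.
    Returns True if the averaged vote accepts the input as ID."""
    return bool(np.mean(id_probs) >= threshold)

# An input scored confidently ID by most class predictors is accepted;
# one scored OOD by most predictors is rejected before multi-label
# disease classification is attempted.
print(in_distribution_vote([0.9, 0.8, 0.95, 0.7]))  # True
print(in_distribution_vote([0.1, 0.2, 0.05, 0.3]))  # False
```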
Unrolled neural networks have recently achieved state-of-the-art MRI reconstruction. These networks unroll an iterative optimization algorithm by alternating between physics-based consistency and neural-network-based regularization. However, they require several iterations of a large neural network to handle high-dimensional imaging tasks such as 3D MRI. This limits traditional training algorithms based on backpropagation, owing to the large memory and compute requirements of calculating gradients and storing intermediate activations. To address this challenge, we propose greedy learning for accelerated MRI (GLEAM) reconstruction, an efficient training strategy for high-dimensional imaging settings. GLEAM splits the end-to-end network into decoupled network modules. Each module is optimized in a greedy manner with decoupled gradient updates, reducing the memory footprint during training. We show that the decoupled gradient updates can be performed in parallel on multiple graphics processing units (GPUs) to further reduce training time. We present experiments on 2D and 3D datasets, including multi-coil knee, brain, and dynamic cardiac cine MRI. We observe that: i) GLEAM generalizes as well as state-of-the-art memory-efficient baselines such as gradient checkpointing and invertible networks with the same memory footprint, while training 1.3x faster; ii) for the same memory footprint, GLEAM yields a 1.1 dB PSNR gain in 2D and a 1.8 dB gain in 3D over end-to-end baselines.
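The idea of greedy, decoupled module-wise updates can be illustrated with a toy two-module linear network in NumPy: each module minimizes its own local loss, and the second module receives a detached copy of the first module's output, so no gradient ever flows across the module boundary. This is a conceptual sketch only; GLEAM's actual modules are unrolled reconstruction blocks with data-consistency terms, and its local objectives differ from this toy regression.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
W_true = rng.normal(size=(8, 8)) / np.sqrt(8)
Y = X @ W_true                       # regression target

W1 = 0.1 * rng.normal(size=(8, 8))   # module 1 weights
W2 = 0.1 * rng.normal(size=(8, 8))   # module 2 weights
lr = 0.05

for _ in range(500):
    # Module 1: local loss ||X W1 - Y||^2; its gradient never involves W2.
    H = X @ W1
    W1 -= lr * X.T @ (H - Y) / len(X)
    # Module 2 sees a *detached* copy of H: updating W2 sends no gradient
    # back through module 1 (the decoupled update). The two updates could
    # therefore run on separate GPUs.
    H_det = H.copy()
    W2 -= lr * H_det.T @ (H_det @ W2 - Y) / len(X)

print(float(np.mean((X @ W1 @ W2 - Y) ** 2)))  # end-to-end error after greedy training
```

Because neither update stores the other module's activations or gradients, peak memory scales with one module rather than the whole unrolled network, which is the property GLEAM exploits.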
Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but a mobile device often has more text-only data than paired audio-text data. We explore training a personalized language model on text-only data that is then used during inference to improve speech recognition performance for that user. We experiment on the LibriSpeech corpus grouped into users, with personalized text-only data for each user drawn from Project Gutenberg. We release this user-specific LibriSpeech (UserLibri) dataset to aid future personalization research. The LibriSpeech audio-transcript pairs are split into 55 users from the test-clean dataset and an additional 52 users. We are able to lower the average per-user word error rate on both sets for streaming and non-streaming models, including a 2.5 improvement for the harder set of test users when streaming.
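Word error rate, the metric reported above, is conventionally computed with an edit-distance dynamic program over words; the following is the textbook definition, not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed as the word-level Levenshtein distance."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("the cat sat on the mat",
                      "the cat sat on a mat"))  # 1 substitution / 6 words
```

The "2.5 improvement" above is an absolute reduction in this percentage metric, averaged per user.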